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Abstract —Rank minimization can be converted into tractable surrogate problems, such as Nuclear Norm Minimization (NNM) and Weighted 
NNM (WNNM). The problems related to NNM, or WNNM, can be solved iteratively by applying a closed-form proximal operator, called 
Singular Value Thresholding (SVT), or Weighted SVT, but they suffer from high computational cost of Singular Value Decomposition (SVD) at 
each iteration. We propose a fast and accurate approximation method for SVT, that we call fast randomized SVT (FRSVT), with which we 
avoid direct computation of SVD. The key idea is to extract an approximate basis for the range of the matrix from its compressed matrix. 
Given the basis, we compute partial singular values of the original matrix from the small factored matrix. In addition, by developping a range 
propagation method, our method further speeds up the extraction of approximate basis at each iteration. Our theoretical analysis shows the 
relationship between the approximation bound of SVD and its effect to NNM via SVT. Along with the analysis, our empirical results 
quantitatively and qualitatively show that our approximation rarely harms the convergence of the host algorithms. We assess the efficiency 
and accuracy of the proposed method on various computer vision problems, e.g., subspace clustering, weather artifact removal, and 
simultaneous multi-image alignment and rectification. 

Index Terms —Singular value thresholding, rank minimization, nuclear norm minimization, robust principal component analysis, low-rank 
approximation 
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1 Introduction 

M inimizing the rank of a matrix can be used as a versatile 
regularizer to derive a low-rank solution and is needed 
in many mathematical models in computer vision and machine 
learning [5], [10], [14], [21], [23], [29], [31], [32], [36], [43], [45], 
As rank minimization, minrank(X) s.t. X g C, is generally 
an NP-hard problem, it is typically relaxed using the nuclear 
norm (i.e., || • ||*, the sum of the singular values), which is a 
tight convex surrogate for the rank function, i.e., rank(-). 
Nuclear norm minimization (NNM) is expressed as 


X* =argminx/(X) + r||X|| 


( 1 ) 


X 


where X G and r > 0 is a regularization parame¬ 

ter. The function /(X) can be arbitrarily defined depending 
on the objectives, e.g., /(X) = ||0 —X||i in robust princi¬ 
pal component analysis (RPCA) [4], /(X) = |||AX —B|||p 
in multivariate regression and multi-class learning [26], 
/(X) = |||7r^(0) — 7r^(X)||fp in matrix completion [4], 
/(X) = ||OX — 0|||. in subspace clustering [21], where O is 
measured data, || • ||i and || • \\f are h and Frobenius norms 
respectively, and 7r^(-) is an orthogonal projection operator 
setting [7rxif{X.)]ij = [X.]ij for (i, j) G and 0 otherwise. 

For the purpose of better approximating the rank function 
rank(-), weighted nuclear norm minimization (WNNM) [10], 
[14], [30], which could be non-convex depending on the 
weights that are used, can be alternatively adopted instead of 
the standard nuclear norm. 

Except for the case that the loss function /(X) is a prox¬ 
imity term, most previous works use first-order optimization 
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approaches, e.g., dual method [8], accelerated proximal gra¬ 
dient [15], augmented Lagrange multiplier (ALM) [18], al¬ 
ternating direction method (ADM) [18], [19], and iteratively 
reweighted least-squares [24]. In the intermediate step, regard¬ 
less of regularization, all these approaches have an iterative 
step to solve a simple NNM, or WNNM, subproblem defined 
as the following nuclear norm and proximity terms: 

Problem (Nuclear norm minimization). For r > 0 and 
A G 

X* = argminx t||X||* + ^||X - AH^^, (2) 

where optimal X* can be obtained by the singular value 
thresholding operator defined as following. 

Definition 1 (Singular value thresholding (SVT)^ [2]). The 

problem (2) has a closed-form solution given by the singular value 
thresholding operator E>r{') as 

X* = §,(A) = Ua>S,(Sa)VI, (3) 

where Sr{x) = sgn(x) • max(|x| — r, 0) is a soft shrinkage opera¬ 
tor [11], and UaXaVa of A. 

The major computational bottleneck of NNM and WNNM 
problems is the necessity of solving Eq. (2) multiple times, 
where SVD computation occupies the largest computation cost, 
i.e., 0{mn min(m, n)) for each SVD [9]. 

This paper proposes a fast SVT technique to accelerate gen¬ 
eral NNM and WNNM computation. Our method is motivated 
by the previous study of a randomized SVD proposed by 
Halko et al. [12], and we extend the original general method in 
several respects for better solving the NNM and WNNM prob¬ 
lems that we focus on in this paper. As a result, we propose an 
algorithm that we cedi fast randomized SVT (FRSVT). We present 

^ A similar result for WNNM can be found in [10], called WSVT. 
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7r^(-) 

Orthogonal projection operator with a map (index set) ^ 

Sr(-) 

SVT operator [2] with the parameter r 


Soft shrinkage operator [11] with the parameter r 


i-th singular value of a matrix 

n 

A standard Gaussian random matrix 

Q 

Orthonormal column matrix 

o 

Observation matrix (Input) 

L,£ 

Low-rank optimization related matrix or tensor 

s,£: 

Sparse optimization related matrix or tensor 

k 

Parameter for the target rank 

P 

Parameter for over-sampling rate 

1 

Sampling (predicted) rate {1 = k p) 

r 

True rank of the target matrix A 

s 

Real sampling rate {s = min(Z, r)) 

T] 

Number of the power iteration 

b 

Parameter for the maximum rank bound 

V 

Parameter vector for the polynomial error bound (v = {k,p} with¬ 
out the power iteration, orv = {k,p,r]} with it) 


TABLE 1: Summary of notations 


the connection between FRSVT and low-rank approximation 
with both theoretical and empirical analyses, and show the 
effectiveness of the proposed method via simulations and a few 
computer vision applications. Table 1 summarizes the notation 
used in the rest of the paper. 

A preliminary version of this paper has appeared in [34]. 
We extend [34] by analyzing the behavior of the proposed 
method in both theoretical and empirical aspects as follows. 
A detailed derivations of error bound on both Frobenius and 
spectral norms, inexpensive and provable accuracy check, and 
comparison with a linear-time approximate SVD [6] are newly 
supplemented in Sec. 3. Based on the analyses, we provide 
theoretically sound implementation tips and other options 
that can potentially improve the performance of the proposed 
algorithm. Additional simulation and experimental results are 
included in Sec. 4. Specifically, this paper makes the following 
contributions: 

• We develop a successive truncated low-rank decomposition 
method that can be generally applied to NNM and WNNM 
problems. Our method achieves high efficiency with rarely 
degrading the accuracy of the host algorithms. 

• By exploiting the proximity of the range space over iterations, 
our method further accelerate the NNM computation by prop¬ 
agating the estimated basis at the previous iteration to the next 
one for acceleration. We call this technique range propagation 
(RP). 

• We provide a theoretical analysis between the low-rank 
approximation and our FRSVT method. We show how the low- 
rank approximation affects to SVT operation as well as effects 
of the interaction with the power iteration and over-sampling. 
In addition, we show the empirical stability and behavior of 
our method with respect to varying parameters. 

• We apply FRSVT to various computer vision applications 
and show the performance gain in comparison with previous 
methods. 

1.1 Related Works 

Candes et al. [4] showed that, under some mild condition, 
the solution of NNM is equivalent to the solution of rank 
minimization in conjunction with a sparse outlier model [4]. 
Inspired by the success of this convex surrogate for rank mini¬ 
mization, low-rank structures have been exploited in various 
computer vision applications, such as rain removal [5], de- 
noising [10], inpainting [14], motion segmentation by subspace 
clustering [21], structure from motion [23], background sub¬ 
traction [29], tag transduction [29], high dynamic range imag¬ 
ing [32], batch image alignment [36], photometric stereo [43], 


image rectification [45], and nuclear norm regularized learn- 
ing [26]. 

While useful, due to the high computational complexity 
of NNM, specifically, SVD used in the SVT operator, a fast 
SVT method has always been wanted in both small- and 
large-scale problems. Liu et al. [22] efficiently solve NNM by 
casting the original convex RPCA problem into a bilinear fac¬ 
torization form. Although this bilinear factorization introduces 
non-convexity in the objective function like other factorization 
methods [7], [46], the work achieved significant speed-up. 
Unlike their method, our method retains the original objective 
function and approximately optimizes it, and is applicable to 
general NNM problems once the problems can be led to the 
form of Eq. (2). 

With retaining the advantage of convexity, Liu et al. [23] 
exactly solve RPCA on a small sub-sampled matrix and propa¬ 
gate the seed solution to other parts via ii filtering. Since both 
Liu et al. [22] and Liu et al. [23] focus only on RPCA problem but 
not general NNM problems, a fast and general SVT method is 
still needed as a tool for NNM to be applied to large-scale prob¬ 
lems. Cai et al. [3] avoid explicit SVD computation using the 
dual of SVT. Since their method uses Newton iterations with 
an inverse matrix, the input matrix needs to be preprocessed 
by the complete orthogonal decomposition [9] (COD) to ensure 
a non-singular square matrix. The standard COD consists 
of twice of QR decompositions with column/row pivoting, 
which requires 0{mn min(m,n)); therefore, the reduction of 
computation complexity is still limited. 

In other thread of works, it has been shown that the 
exact SVD computation is unnecessary in the inner loops of 
NNM [18], [26], [29]. Mu et al. [29] propose a compressed 
optimization by random projection. Ma et al. [26] solve NNM 
related problems with a linear-time approximate SVD [6]. 
However, these methods become occasionally unstable, be¬ 
cause the input matrix is directly approximated by sampling 
or projection, where the original information is impaired, and 
the randomness leads to unstable and incorrect results. Unlike 
these methods, our method only approximates the subspace 
bases to guarantee that most spectrum information is retained. 
Scope of this work This work mainly focuses on the be¬ 
havior of the fundamental core module, i.e., SVT operator, 
combined with low-rank approximation. All the provided 
theoretical analyses in this work are mainly focused on the 
FRSVT operator per se, rather than the overall optimization 
procedure of low-rank optimization problems. In rank mini¬ 
mization literature, there are approaches [28], [33], [41] that are 
tightly entangled with the optimization procedure to promote 
low-rank solutions, and they are beyond SVT like modular 
operation. This paper does not cover this type of approaches, 
yet focuses on the SVT operation that is used broadly [5], [10], 
[14], [21], [26], [32], [36], [43], [45]. 

2 Fast Randomized Singular Value Thresh¬ 
olding (FRSVT) 

The basic idea of our method shares the idea of Liu et al. [23] 
and Ma et al. [26], in that the solution of Eq. (2) can be found 
by applying SVT to a small matrix instead of the original large 
matrix as illustrated in Fig. 1. Instead of sampling columns 
or rows of a matrix as in [23], [26], our method extracts 
a small core matrix by finding orthonormal bases with the 
unitary invariant property. Specifically, since the NNM defined 
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Fig. 1: Basic idea of our method. Instead of applying SVT to the large 
original matrix, if we can obtain the same result by applying SVT to small 
matrix with few additional efforts, the complexity could be significantly drop. 
As shown in Proposition 1, once a large matrix is decomposed into a core 
matrix and orthonormal matrices (the left and right thin matrices in the 
illustration), by applying SVT to the small core matrix we can obtain the 
same result with the direct SVT computation on the large matrix. 


in Eq. (2) consists of unitary invariant norms, the following 
equality holds: 

Proposition 1. Let A = QB G where Q G has 

orthonormal columns. Then, 

S,(A) = QS,(B), (4) 

where S>r{') is the SVT operator. 

Proof. When m = n and Q is an orthonormal matrix, the equal¬ 
ity obviously holds by the unitary invariant property of norms. 
For m> n, we prove the equality for the orthonormal column 
matrix. Consider an arbitrary matrix Z = Uz5]zVj G 
where Vz^z^z SVD, then 

Q • §-r(B) = Q • argmin ^r||Z||* + i||Z - 

= argmin ^t||Q'^X||* + i||Q(Z - B)|||.^ 

(by letting QZ = X, and as Q^Q = I, 
and the unitary invariant norm property) 

= argmin (^r||X|U + i ||X - QB|||) 

(since IIQ^XIU = ||Z|U = IIQZIU = ||X||*) 

= S^(QB) =S^(A). 

□ 

In general, SVT requires SVD computation, and its com¬ 
plexity^ is 0{mn?). Based on Proposition 1, we can avoid 
expensive computation by instead computing SVT on a smaller 
matrix B G when Q G is available. Given 

Q G that best approximates A by a rank-/c matrix, the 

complexity of SVD of B G becomes 0{nkf). Therefore, 

when k n, the computation speed can be significantly 
improved. 

Our SVT computation iterates the following two steps: 1) 
Estimating an orthonormal column matrix Q, and 2) Com¬ 
puting SVD of B for SVT. For SVT computation, a partial (or 
truncated) SVD is frequently used to reduce the complexity in 
many prior arts. Our method similarly finds a rank-/c approx¬ 
imation {k < n) oi the original matrix A as A ^ = QB. It 

saves the computation of the first step as well as the second 
step, because the size of matrices Q and B are reduced to 
m X k and k x n, respectively. By exploiting the observa¬ 
tion that the major orthonormal k bases evolve slowly over 
iterations, our method efficiently initializes matrix Q at each 
iteration by bypassing expensive random range estimation 
(Range propagation in Sec. 2.1). In addition, by avoiding direct 

^Without loss of generality, we assume m > n in this paper. 


Algorithm 1 Fast Randomized Singular Value Threshold¬ 
ing (FRSVT) algorithm 

1: Input : A G > 0,1 = k p > 0 and q > 0. For 

range propagation, the orthonormal column matrix Q of the previous 
iteration. 

2 : 

3: if not Range propagation then 

4: Sample Gaussian random matrix Cl G ^ ^ 

5: Y = ACt 

6 : Q = QR_CP(Y) 

7: else 

8 : Sample Gaussian random matrix Cl G 

9: Y = ACl 

10: Qy = PartialOrthogonalization(Q, Y) 

11 : Q = [Q,Qy] 

12: end if 
13: repeat 

14: Q = QR(AA^Q) 

15: until 77 times 

16: [H,C] = QR(A^Q) 

17: [W, P] = PolarDecomposition(C) 

18: [V, D] = EigenDecomposition(;0 
19: Sr (A) = (QV) <Sr(D) (HWV)^ 

20 : 

21: Output : Sr (A), Q = QV 


SVD computation, we can further reduce the computation as 
described in Sec. 2.2. Also, we describe a target rank prediction 
technique for further improving speed in Sec. 2.3. 

2.1 Finding Approximate Range 

Inspired by Halko et al. [12], we first estimate the or¬ 
thonormal bases Q = [qi,*** 7 qd (where I > k) such that 
span(Q) C Range(A)^ from a matrix compressed by random 
projection. Intuition of their randomized range finding algo¬ 
rithm is as follows. By multiplying a random vector ujj, a 
random linear combination of the column vectors of A 
is generated, which encodes the partial range of A. Suppose 
A = A/e + E, where Ak is the rank-Zc projection of A, whose 
range is the target to be captured, and E represents small 
perturbation, then sample vector can be obtained by 

yj=AkWj^^Wj. ( 5 ) 

Even though unwanted E may be included in {y^ } because 
the action of Ak {i.e., magnitudes of spectrum) is larger than E, 
the range of Ak is dominant to be captured in {yj}. However, 
if only k vectors {y^} are sampled, {y^} could not span 
the entire Range (A/.). By increasing the sampling rate, most 
of Range(A/c) can be captured; therefore, we oversample I 
sample vectors, so that Range (A/c) can be captured as much 
as possible. 

Given the sample matrix Y = [yi, • • • yi], where I = k p, 
Q can be obtained by orthonormalizing Y. When 
r = rank(A) < I, r bases are enough to span the entire 
Range(A). Indeed, when rank(A) < I, the range finding al¬ 
gorithm is fairly accurate and close to the exact method as 
we will see the theoretical analysis in Sec. 3. Thus, we can 
reduce the dimension of Q for almost free by estimating the 
rank and dominant bases by QR decomposition with Column 
Pivoting (QR-CP) to Y. While Halko et al. also proposed 
an algorithm to adaptively and approximately determine the 
number of bases by different randomization for sampling, 
orthogonalization, and re-orthogonalization, our method is 
based on an exact method with the same complexity using 

^We only consider the column range space of a matrix for simplicity of 
explanation, but the row range also can be used. 
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Fig. 2: Illustration of singular value decaying. [Left] Gaussian random, video 
and image samples. [Right] Decaying graphs of singular values. The Red, 
Green, Blue lines represent the graphs of a Gaussian random matrix, video, 
and image, respectively. The spectrum of visual data decays significantly 
fast. 



Fig. 3: Angular difference of subspaces between subsequent iterations. 


QR-CP, and we adaptively predict the sampling rate in a 
simpler manner (we will see in Sec. 2.3 or Sec. 3.2). Fortunately, 
in LAPACK, an efficient QR-CP routine using level-3 BLAS 
(dgeqpS) is available. Moreover, our method does not require 
the upper triangle matrix from QR, but only the orthonor¬ 
mal basis Q; therefore, we can avoid extracting the whole 
triangular matrix but only compute rank and Q with dgeqp3 
and orgqr routines, respectively. After obtaining Q G 
where s = min(/,r), we can compute a small matrix B by 
B = Q^A G 

Range Propagation (RP) for Fast Range Finding As shown 
in Fig. 3, we observe that Range(A(^)) at the i-th iteration is 
similar to the one at the {i — l)-th iteration in NNM related 
problems. This motivates us to use the singular vectors at 
the (i — l)-th step as an initial approximation of range bases 
at the Ath step. A similar observation is exploited in 
[20]. To capture the change of the range space, we addi¬ 
tionally sample p sample vectors {y}. We append the sam¬ 
pled {y} to the previous singular vector matrix Q(i_i) as 
Q(i) = [Q(i-i),yiG • • ,yp]/ and apply partial orthogonaliza- 
tion only for newly added {y} by a modified Gram-Schmidt 
procedure [9]. The number of bases can subsequently be re¬ 
duced by checking the rank with QR-CP in the first step of the 
power iteration or by a method described in Sec. 3.2. 

Power Iteration Among the overall process in our algorithm, 
since the only approximation step is estimation of the orthonor¬ 
mal column matrix Q, the accuracy of our algorithm depends 
only on this step. In Eq. (5), if the magnitude of the action of 
A/e is not dominant against E, the directions of sample vectors 
are biased and may be affected by portions of Range(A/e)^. 
This introduces accuracy degradation to the rest of the process. 
To resolve this issue, Halko et al. [12] used a power iteration 
scheme, which makes the spectrum difference between A/^ and 
E larger by estimating Q on (AA^)^ A. It improves the chance 
of better capturing the range of A/^ from Y = (AA^)^Ar2, 
while the singular vectors remain unchanged. Halko et al. also 
showed that = 2 or 4 power iterations are sufficient for 
usual data of interest, and highly accurate range finding can 
be achieved. As shown in Fig. 2, decay of singular values of 
visual data is much faster than the one of a Gaussian random 
matrix. Our empirical tests also show that 77 = 2 is sufficient 


enough, and it is used in all our experiments. 

2.2 Computing the Singular Values (Vectors) 

The NNM problem is now reduced to SVT on a smaller 
matrix B. In this section, we further reduce the computation 
time of SVT on B. The SVT operator can be computed by 
SVD and shrinkage on its singular values. For positive semi- 
definite matrices, SVD can be more efficiently computed by 
Eigen decomposition (ED), which is typically faster than SVD 
in our empirical tests with small matrices. To apply ED to a 
general matrix, we form a positive semi-definite matrix by the 
following decomposition: 

Definition 2 (Polar decomposition [13]). Let X G 

m>n. There exists a matrix W G and a unique Hermitian 

positive semi-definite matrix P G such that 

X = WP, W*W = I, 

where I is the identity matrix. If rank(X) = n, then P is positive 
definite and W is uniquely determined. 

Note that the existence of polar decomposition is equivalent 
to the existence of SVD. 

We use a Newton based polar decomposition suggested by 
Higham et al. [13], which has a quadratic convergence behavior. 
In our experiment, it converges at a small number of itera¬ 
tions (typically, seven) with various different data, which is 
consistent with the result of [3], [13]. Due to the requirement of 
the inverse operator in Newton iterations, it is only applicable 
to non-singular square matrices. Since B^ G R’^^^ is a full 
column rank matrix, the non-singular square matrix can be 
simply obtained from B^ = HC by QR decomposition, where 
we call C G R^^^ a core matrix that is always non-singular 
and square. For this step, unlike the procedure in Sec. 2.1, no 
column pivoting is required. 

We sequentially apply the polar decomposition and ED on 
the core matrix; C = WP = WVDV^, where D and V are 
the eigenvalue and eigenvector matrices of P, respectively. 
Since the matrices H, W, and V are orthonormal column 
matrices, the diagonal matrix D is equivalent to the singular 
value matrix of B. Finally, St-(A) can be approximated by 

§,(A) « S,(A,) = (QV) Sr(D) (HWV)T. (6) 

For the range propagation, the singular vector matrix Q is 
stored as Q = QV or HWV (according to the side of random 
matrix multiplication). Overall algorithm is summarized in 
Algorithm 1. 

2.3 Adaptive Rank Prediction (AP) Heuristic 

For SVT, only singular vectors corresponding to the singular 
values that are greater than a certain threshold are needed, and 
full SVD is unnecessary. Since the rank of A(^^) is unknown 
before SVD, predicting its rank can avoid unnecessary compu¬ 
tation. We observe that, in many NNM related problems, the 
rank of A(^^) tends to monotonically increasing or decreasing 
over iterations, and the rank is stabilized as the number of 
iteration increases. As we shall see in the theorem of error 
bound in Sec. 3, over-sampling is always useful to reduce the 
expected error bound of FRSVT. Thus, optimistically predict¬ 
ing rank allows to achieve both computational efficiency and 
stability. 
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The efficiency of our method may be degraded by an 
excessively high sampling rate. In such a case, we resort to 
the truncated SVT by upper bounding the target rank. As 
shown in natural image statistics of Fig. 2, the rank of 
is generally stabilized at low-rank in many computer vision 
application. Usually, the final accuracy is not harmed, as seen 
in the successes of the truncated SVD in the NNM related 
problems [18], [19], [20], [22]. 

Based on these observations, we define over-sampling rate 
p as: 

_ j a, iin < k, 

p%+i — ^ \pn\^ otherwise, ^ ^ 

where m > n is assumed, a is a constant (set to a = 2), 
p G ( 0 , 1 ] is a constant parameter to rapidly follow the real 
rank (set to p = 0.05 or less), and [•] denotes the ceil 
operation. The prediction rule for sampling rate is defined 
as /i+i = min(ri + Pi+i, h), where b = [yn] is the maximum 
bound of sampling rate, 7 G ( 0 , 1 ] is the proportion parameter. 
The sampling rate li can be regarded as the predicted rank 
at the i-th iteration, and Vi is the number of singular values 
of A 7 ) that are larger than the threshold, i.e., the estimated 
rank of St-(A(^)). Initially, we set Iq = 0.1b. In the case of the 
range propagation, the number of columns in becomes 

Vi, and Pi = li — Vi. When < li, Eq. (7) slightly over¬ 
samples, otherwise it optimistically predicts the rank of the 
next iteration by a larger over-sampling rate. By virtue of low¬ 
rankness of visual data shown in Fig. 2, the optimistic rule 
leads to an accurate estimate. 

2.4 Fast Random Projection 

We can replace the Gaussian random matrix with other less 
expensive and fast transforms according to the Johnson- 
Lindenstrauss lemma [27], such as a subsampled random 
Fourier transform, subsampled random Hadamard transform, 
or sparse Gaussian random matrix (in Matlab, sprandn() func¬ 
tion). The cost of the sparse Gaussian random matrix transform 
depends on the sparsity of the matrix, but it may become 
less accurate if the matrix is too sparse. The two structured 
subsampled random matrices can be multiplied with the cost 
0{mn log 1) by efficient computation methodology {e.g., FFT), 
while the cost of dense Gaussian matrix multiplication is 
0{mnl). In our implementation, we simply use dense Gaussian 
random matrices drawn from standard normal distribution in 
entry-wise. 

2.5 Computational Complexity 

The step-by-step computational complexities are summarized 
in Table 2. It does not consider the range propagation proce¬ 
dure for simplicity, but the range propagation only reduces the 
computational complexity. For example, while the computation 
of the sample matrix Y by random projection requires 0{mnl), 
we can reduce it to 0{mn) by the range propagation technique. 

Since Lines 3 and 4 {i.e., power iteration) in Table 2 iterates 7 
times, the overall algorithm in the case of m < n is much more 
efficient than the other way around. We ensure A to be m < n, 
if necessary by transposing it, at the beginning of the algorithm. 
The orthonormalization operation of Line 4 in Table 2 (Line 14 
in Alg. 1) is used for enhancing the numerical stability, but in 
practice it may not be necessary for every step and could be 
skipped except at the last step. However, in this work, we do 
not omit it but consistently adopt the orthonormalization every 
step as described in Alg. 1 for simplicity of evaluation. 


Line No. 

Operation 

Complexity 

1 

mxl mXn nxl 

O(mnl) 

2 

mxl 

^ = QR_CP(^'') 

mxs 

O(ms^) 

3 

mxs 

O(mns) 

4 

mX s mX s 

7^ = QR('V') 

O(ms^) 

5 

n X s nx^ mxs 

^ 

O(ms^) 

6 

nX s sX s nX s 

= QR(^) 

O(ns^) 

7 

Polar & Eigen Decomposition with C 

(Lines 17,18 in Alg. 1) 

0(?) 

8 

Composition (A) (Lines 19 in Alg. 1) 

O(mns) 


TABLE 2: Computational complexity of Alg. 1. For simplicity, we do not 
include the range propagation case, but it can be easily deduced by the 
above table because most of basic operations remain similar. Incorporating 
the range propagation obviously requires cheaper computation than the 
above algorithm. 

3 Analysis 

The proposed method provides an approximate solution of 
SVT; thus, a natural question is how accurate our algorithm 
is. This section analyzes this respect by first establishing the 
relationship between SVT and low-rank approximation. Then, 
we show that, under what conditions, our method can be ex¬ 
pected to be accurate, whereby a few design tips are described 
for potentially further improving the algorithm. Besides, we 
show the bound comparison between our method and a repre¬ 
sentative work, Linear-Time SVD (LTSVD) [ 6 ]. 

3.1 Relationship between SVT and iow-rank approxima¬ 
tion 

We can see the relationship between SVT and low-rank ap¬ 
proximation by comparing the low-rank approximation gap 
and SVT operation gap for two different matrices. We begin 
with reviewing some useful relationships. 

The SVT operator St-(-) is a proximal operator [1]; thus 
there exists its corresponding dual operator Pr(') such that 
X = St-(X) + Pt-(X). The dual relationship is valid by the 
following 2-norm ball Euclidean projection operator [3]. 

Definition 3 (2-norm ball Euclidean projection). For r > 0, 

consider SVD ofY = UyXyVy- Then the 2-norm hall Euclidean 
projection operator is defined by 

P,(Y) = UYPr(DY)V^, 

where Vr{x) = min(x, r) is an element-wise operation. 

Given the 2-norm ball Euclidean projection operator, we 
have the following lemma. 

Lemma 1. The 2-norm ball Euclidean projection operator is the 
closed-form optimal solution of the following problem: 

P,(Y)= min ||Y-X||^. 

||X|| 2 <r 

Proof. Since the proof is straightforward, we instead provide 
the idea of the proof. By von Neumann's inequality, the optimal 
solution of X should have the same singular vectors with Y. 
Since the / 2 -norm and the Frobenius norm are both unitary 
invariant, the problem is reduced to the diagonal matrix form. 
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Thus, the projection operator has an explicit form as shown in 
Definition 3. □ 

In order to derive a tight error bound, we need contraction 
inequality, called pseudo-contraction^ [37]: 

Proposition 2 (Pseudo-contraction [37]). Let A, B G and 

St-(-) he a proximal operator. Then, the following pseudo-contraction 
is satisfied. 

||S.(A)-S.(B)||| 

< ||A - B\\% - ||(S.(A) - A) - (S.(B) - B)||2^ 

= ||A-B|||.-||P.(B)-P.(A)|||.. 

The pseudo-contraction holds for all the proximal opera¬ 
tors. For deriving a spectral norm based error bound, we derive 
the spectral pseudo-contraction. 

Proposition 3 (Spectral Pseudo-contraction). Let A,B G 

j^mxn St-(-) be a proximal operator. There exists a constant 
1 < C <2 satisfying 

||S,(A)-S,(B)||2 

< C||A - B||2 - ||(S,(A) - A) - (S,(B) - B)||2. 

Proof. We have the following inequalities for St-(-) and Pr(-) 
from non-expansiveness of proximal operator [1]: 

||S,(A)-S,(B)||2<||A-B||2, 

||P,(A)-P,(B)||2<||A-B||2. 

By adding two inequalities, we have 

||S,(A) - S,(B)||2 < 2||A - B||2 - ||P.(A) - P,(B)||2, 

and then this implies there exists C <2. For the lowest value 
of C, using A = St-(A) + Pt-(A) and triangle inequality, we 
have 

||A-B||2 = ||S.(A) + P,(A)-(S,(B) + P,(B))||2 
< ||§,(A) -§,(B)||2 + ||P,(A) -P,(B)||2 

^||A-B||2 - ||P,(A)-P,(B)||2 < ||§.(A)-§,(B)||2, 

which implies C > 1. We can conclude there exists a constant 
C such that 1 < C < 2 satisfying Eq. (8). □ 

Since both pseudo-contractions involve the error term be¬ 
tween two projection operators, i.e., ||Pt-(A) — Pt-(B)||, the 
following lemma is useful to see the bound. 

Lemma 2. Let Ak he a rank-k approximation of A. Then, 

min||P^(A)-P^(Afc)|||. = min((Tj(A), r)^ 

min ||P^(A) - P^(A/c )||2 = min(c7fe+i(A), r). 

Proof. Let A^ = QQ* A = PqA, where Q G consists of 

k dominant orthonormal bases, and Range(Q) Range(A/c), 

^Refer to Lemma 3.3 of Pierra et al. [37] 


where A^ is the unique and exact mnk-k truncated matrix of 
A, and Pq = QQ* is an orthogonal projector. Then, 

||P,(A)-P,(Afe)||^ 

= ||P,(A)-P,(PqA)||2, 

= ||P,(UaSaVI) - P,(PqUaSaVI)||1. 

= ||UaPx(Sa)VX - PqUaPx(Sa)VX||?p 
(by the property of the unitary transform) 

= ||UaPx(Sa) - PqUaPx(Sa)||f 

(by the unitary invariant of the norm) 

= ||(I-Pq)UaP.(Sa)||X 

= ||P4UAPr(SA)||L 

Pq = I — Pq is the orthogonal projector onto the com¬ 
plementary subspace Range(PQ)^, which is also close to 
Range(A/c)^. Thus, the minimum is achieved only when 
Range(PQ) = Range(A/e) or Range(PQ)^ = Range(A/e)^ 
by the well known variational characterization. Therefore, 

min||PX^UAPT(SA)||F = 

The spectral norm case can be derived similarly. □ 

Other than the randomization step in sampling Y, our 
method is based on exact algorithms. Hence, the accuracy 
is only affected by the randomized range estimation, which 
computes a rank-/c approximation of a matrix. Since there is 
only one approximation step, the proposed method shares the 
same theorems with Halko et al. [12] as: 

Theorem 1 (Average Frobenius error bounds by randomiza¬ 
tion [12]). Let Ak he a rank-k approximation matrix A G 
For a target rank k >2 and an over-sampling parameter p>2, 
where k < min(m, n), draw a standard Gaussian matrix 
fl G Then, the theoretical minimum error and the upper 

bound satisfy 

'^j>k^ - AfcllF < poly(v) • (t|(A), 

where E denotes expectation with respect to the random matrix, 
poly(v) is a function for v = {k^p} without the power iteration 
orv = T]} with the power iteration. 

Halko et al. [12] show that the upper bound of the error is 
close to the theoretical minimum error with high probability 
in conjunction with some improving techniques {e.g., over- 
sampling and power iteration), and the bound is rather pes¬ 
simistic. 

In Theorem 1, in the case without power iteration, poly(-) 
is defined as: 

poly(v = {fc,p}) = (l + fc/(p-1)). (9) 

The polynomial bound with power iteration in the Frobenius 
norm representation has no simple form [12]; therefore, instead 
we observe the error bound with power iteration by referring 
to the spectral bound in the following Theorem 2 and Corol¬ 
lary 1. 

Theorem 2 (Average spectral error bound by randomization 
with the power scheme [12]). Let Aj^ he a rank-k approximation 
matrix A G For a target rank k >2 and an over-sampling 
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parameter p > 2, where k < min(m, n), draw a standard Gaus¬ 
sian matrix ft G Then, the theoretical minimum error 

and the upper bound satisfy 


cTk+i < IE||A - Ak 


< 


277 + 1 

^fe + 1 




2(277 + 1) I 


2r? + l 


( 10 ) 


Kj>k “ J 

where p denotes the number of power iterations. 

Corollary 1 (Loose average spectral error bound [12]). Theorem 
2 is bounded as 

E||A — Ak \\2 < [Bound of Theorem 2] < poly(v) • a/c+i, (11) 

where poly(v) = [l + f^+ ■ \/h-k\^ , v = 

{/c,p, T]}, and h = min(m, n). 

Corollary 1 is only useful for understanding the bound 
behavior rather than identifying the tight bound, because it 
is far broader than Theorem 2. However, it shows that, as the 
number of power iterations p increases, poly(v) approaches 1 
exponentially fast, and when poly(v) ^ 1 , the spectral error is 
bounded by the theoretical minimum a/c+i. 

For our method, we can derive the following Frobenius 
error bound for the approximate SVT {i.e., FRSVT) using Theo¬ 
rem 1, Proposition 2 and Lemma 2. 

Theorem 3 (Average error bound of the approximate SVT). 

Let St-(') be the SVT operator and Ak holds Theorem 1. The average 
error satisfies the following inequality: 


E||S,(A)-S,(Afc: 


< poly(v) • 


where G{A) = X] min(crj(A), r)^ > 0. 
Proof By Proposition 2 and Lemma 2, 
ll^r(A) — St-(A/j;) |||. < ||A — A/j;|||. — \\Fj- 
< IIA — Ak\\‘F — ■ 

By taking the expectation of both sides, 

II 


(A)-P+Ak)||^ 
min(<jj(A), r)^. 


'j>k 


Epr{A)-^r{Ak] 

< E||A - AkWp - min+j (A), rf 

< poly(v) • 


□ 


^j>k 

(by Theorem 1) 

= poly(v) • (E,>fe+"(A)) - 

Obviously, G(A) > 0 by definition. 

As aforementioned, poly(v) has a simple form of Eq. (9) 
only when the power iteration is not taken into account. For 
the case with power iteration, it is useful to see the spectral 
error bound of the approximate SVT, which is our main result. 

Theorem 4 (Average spectral error bound of the approximate 
SVT with the power scheme). Let be the SVT operator and 

Ak holds Theorem 2. The average spectral error satisfies the following 
inequality: 

E||S+A)-S+Afe )||2 < 

1 

277+1 

-G'(A), 


c 


277+1 

^/c+1 


/k-\-p 




j>k 


2(2j7+1)^ 


(12) 


where 1 < C <2, and G'{A) = min(cr/c+i(A), r) > 0. 

Proof We can simply derive Eq. (12) in a similar manner to 
Theorem 3, but instead by using Theorem 2 and Proposition 3. 

□ 


Corollary 2 (Loose average spectral error bound of the 
approximate SVT with power scheme). Theorem 4 is bounded 
as 


E||St-(A) — St-(A/c )||2 < [Bound of Theorem 4] 

< C • poly(v) • cTfe+i - (^'(A), 


where Y={k,p,r]}, poly{'v)= 1 + f^i + 

/i=min( 77 i, n), 1<G<2 and G'{A) = min(cr/c+i(A), r) > 0. 


277 + 1 


Proof Since aj > aj^i for any j, the term ^ 

in Eq. (12) is upper-bounded by \/h — k ■ crll\\ Then, re- 
expressing the inequality leads the conclusion. □ 


Consequently, both of Theorems 3 and 4 assert that 
the bounds of the approximate SVT are tighter than the 
respective errors by mnk-k approximation in Theorems 1 and 
2 on average. Thus, we have the following properties from 
Theorems 3 and 4: 

• Once cr/c+i(A) approaches a small value, then 
||St(A) — St(A/c)|| may approach close to zero, i.e., which 
indicates the approximation is almost exact, roughly when 
rank(A) < k. Detailed analyses of how we can exploit this 
property in the algorithm are described in Sec. 3.2. 

• Fortunately, the fact that the bound becomes rapidly tighter 
as p or 77 increases in both practice and theory remains 
consistent with the result of Halko et al. [12]. 

• The bound is independent of the size of the input matrix. 


3.2 Residual Singular Value Check without Singular Value 
Computation 

In the previous section, we state that our algorithm may pro¬ 
duce almost exact results when k > rank(A), i.e., (Jk-\-i = 0. We 
may have a broader condition for ensuring accurate estimates, 
because the bounds in Theorems 3 and 4 are tighter than those 
for low-rank approximation in Theorems 1 and 2, respectively. 
In this section, we explain the condition in which the proposed 
method can be expected to be accurate, and how the condition 
can be verified in a computationally inexpensive manner. 

Given the target rank k and an approximated matrix A 
by the range approximation, if cr/c(A) < r, then no accuracy 
improvement is possible. This is because all the remaining 
singular values {aj{A)}j^k are zeroed out by thresholding 
operation with r, and these does not affect the solution any 
further. Unless (Jk{A) < r, we can additionally estimate a 
few more bases as described in Sec. 2 . 1 , and we may check 
the condition again until a desired accuracy is achieved. This 
procedure can be adopted with Gram-Schmidt procedure [9] in 
Line 10 of Alg. 1 when an accurate estimate is required. 

While the complexity of the single additional basis estima¬ 
tion only affects the factor of s in Table 2 by s ^ s + 1, which 
still preserves the original complexity, checking a/e (A) requires 
additional computation because it is unavailable until the end 
of the FRSVT algorithm. Instead of directly estimating ak{A), 
we can utilize the following useful lemma. 
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Lemma 3 ([42]). Let A G and let qhe a positive integer, a 

he a real number greater than 1. By drawing an independent family 
: i = 1, 2,g} of standard Gaussian vectors, then, 

||A||2 < afl ||||2 (13) 

with probability at least 1 — a~^. 

The lemma illustrates the relationship between the true 
largest singular value and the estimate by Gaussian random 
projection. From this, we derive the following proposition. 

Proposition 4 (Residual singular value estimate). Let Q be any 
rank-k orthonormal column matrix. Given Q, the {k^l)-th singular 
value is bounded by (j/c+i(A) < 11(1-QQ* )A|| 2 . Then, drawing 
a standard Gaussian vector uj, 

11(1 - QQ^Alb < afiWd - QQ*)Aa;||2, (14) 

with probability at least 1 — a~^. 

Proof From the well known variational characterization of 
singular values, we can obtain the minimum of||(I — QQ*)A ||2 
with Q*Q = I (orthonormal column matrix) iff Q = Uj^, 
where U/^ is the singular vector matrix of A corresponding 
to the top-k largest singular values. Thus, 

11(1 - QQ*)A||2 > ||(I - UU*)A||2 = a,+i(A). 

By drawing a single standard Gaussian random vector uj, the 
left side of the above inequality can be further upper-bounded 
by Lemma 3 as Eq. (14) with probability at least 1 — a~^. This 
concludes the proof. □ 

Since y = Auj in Eq. (14) is the same by-product 
with the over-sampling during range propagation procedure 
in Sec. 2.1, we can check the provable upper bound esti¬ 
mate of {k + l)-th singular value cr/c+i(A) (< <j/c+i(A)) by 

= ayTlIy “ QQ*yll2 with high probability (say 95% 
with a = 20), where the residual vector computation y — QQ*y 
is also a by-product of the Gram-Schmidt procedure. Notice 
that all the computation for estimating the quantity has 
been already performed in the original FRSVT procedure; 
therefore, no additional computation is required except for 

a scaling with which is almost negligible. Thus, by 

checking < r, we are able to avoid unnecessary additional 
sampling. 

3.3 Bound Comparison with Linear-Time SVD 

We show the error bound comparison with a relevant method 
by showing the approximation error bound of Drineas et al. [6]. 

Theorem 5 (Average error bound of LTSVD [6]). Let 

A G R’^x’^ and be a matrix created by the LTSVD algorithm [6] 
by sampling c columns of A with probabilities such that 

Pi > P/llA|||./or some positive P < I and e > 0. If 

c > Akj, then the average Frobenius error is bounded by 

E||A - UkUjAWl = E||A - AkWl < ||A - A^Hl, + e||A|||.. 

In addition, the average spectral error is bounded by 

E||A - HfcHjAlb = E||A - A^lb < ^HA - Afe||i + e||A||2„ 

For a fair comparison, we set the same parameters when 
comparing the low-rank approximation bounds of Drineas et 


al. and Halko et al. [12], i.e., the over-sampling rate to be the 
same as the specified target rank k. 

Corollary 3. Suppose that the sampling rate c is set twice larger 
than the target rank k, i.e., c = 2k, then e > a/2, and LTSVD also 
holds the following bounds: 

E||A-Afe||| <^.^^ct|(A) + \/2^.^^ a|(A), (15) 

1 

E||A-Afc||2 < (ct|+i(A) + \/2^.^^ a|(A)j .(16) 

Proof. From Theorem 5, we have < 1, e > 0 and c > Ak/Pe^. 
Hence, e > 2^^. The lowest bound can be achieved at /3 = 1; 

therefore, if c = 2/c is assumed, e > = V2. In addition, 

we can obtain Eqs. 15 and 16 from the following bounds by 
using the above result with Theorem 5. 

E||A-A,||2,< ||A-A,||| + \/2||A|||. 

The average spectral bound can be derived analogously. □ 

From Corollary 3 and Theorem 1, we obtain the following 
conclusion. 

Proposition 5. Suppose the target rank k > 4 and the over- 
sampling rate to be k (for Halko et ed. [121, p = k, and for LTSVD, 
c = 2k), then on average Halko et al. [121 without power iteration 
has a tighter bound than LTSVD. 

Proof. We compare the right bound terms of Theorem 1 and 
Corollary 3 in Eq. (15). 

[LTSVD] (A) + V2 a2(A). (17) 

[Halko et al. ] poly(v) • J2j>k (A). (18) 

Suppose that Eq. (17) < Eq. (18). 

min(m,n) 

E <t2(A) + ^/2 E <t 2(A) < poly(v) ■ E <^|(A) 

j>k j=l j>k 

min(m,n) 

^ -^E^|(A) + y2 E a|(A)<0 

j>k j=l 

=> E2 E <^|(A) + C{k) • E <t2(A) < 0 (19) 

l<j<k j>k 

(c(fc) = E2-^) 

Eor k > d, the minimum of C{k) is achieved at /c = 4 as 
C(4) = a/2 — I > 0. Thus, since C{k) > 0 for k > d, the 
left term of Eq. (19) is positive for k > 4. This contradicts 
the assumption that Eq. (17) < Eq. (18), which concludes the 
proof. □ 

Eor simplicity, we make a reasonable assumption, the target 
rank k > 4. Since the bound comparison is performed without 
the power iteration scheme, Halko et al. [12] may have a loose 
bound. However, as shown in Proposition 5, Halko et al. [12] 
still result in a better approximation bound than LTSVD even 
under a non-ideal condition. Accompanied with the power 
iteration, we may be able to derive broader conditions that 
Halko et al. [12] perform better than LTSVD in a similar manner 
to the proof process of Proposition 5. We would like note that 
LTSVD is still fascinating in that it does not require dense 
matrix multiplications for Gaussian matrix sketching used in 
Halko et al. [12] by computing the approximate leveraging 
score {pi} which is computationally less expensive. 
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Fig. 4: SVT comparisons among SVD methods. [Top] Experiments on Matlab 2010a. [Bottom] Experiments on Matlab 2014a. Accuracy is measured 
by normalized root mean squared error (NRMSE), ||A* - Afc||F/||A*||ir, where A* = SrCDor), = Sr(Dfc), D^t is the input data, and is 
approximated by each method. For the low-rank matrix test, we generate input data matrices whose rank correspond to the target rank by multiplying two 
Gaussian random matrices with m x r and r x n, while in other tests full rank matrices are used. In (b,d,h,j), the theoretical minimum error bound by rank 
truncation in Sr(Dfc) is provided for guidance, and it is defined as J2j>k ^^(A*), where cri(A*) is computed by Matlab built-in SVD. For the low-rank case 
in (f,l), the theoretical minimum error is zero, so we omit the theoretical bound. 


4 Experimental Results 

In this section, we first evaluate the efficiency of the proposed 
method in comparison with other methods using simulation 
data. The evaluation is conducted by examining the perfor¬ 
mance of a single SVT computation and also by RPCA com¬ 
putation, which is arguably the most relevant NNM problem 
in computer vision today We then show NNM applications 
in computer vision using real-world data; subspace clustering, 
semi-online weather artifact removal, and simultaneous multi¬ 
image alignment and rectification. All the experiments are con¬ 
ducted on a PC with Intel i7-3.4GHz and 16GB RAM. The same 
shared parameters were used among different algorithms. 

4.1 Evaluation using Simuiation Data 

We quantitatively evaluate our method in comparison to other 
methods with synthetic matrices sampled from a standard 
Gaussian distribution. Computational times are evaluated us¬ 
ing Matlab 2010a and 2014a 64bits. Since the recent Matlab 
has been intensively optimized on Intel CPU (mainly due to 
improvement of Intel MKL), the computation efficiency has 
been noticeably improved. Therefore, it is worth reporting 
the performance difference on these two versions of Mat¬ 
lab, because most related works have been assessed with 
Matlab older than 2011a [3], [20], [22], [23], [29]. For a fair 
comparison, we turn off multi-threading functions including 
maxNumCompThreads (1) in Matlab. We describe the imple¬ 
mentation details of each method in Table 3. 

Single SVT test Figure 4 compares speed and accuracy of 
SVT computation using various SVT methods, such as Matlab 
built-in SVD^ (baseline), Lanczos [17], FSVT [3], LTSVD [6] and 
our FRSVT (with / without range propagation (RP hereafter)). 
All the implementation details are summarized in Table 3. 
Except for the baseline SVD, the others produce the truncated 
SVD of the input matrix, and it is used for SVT computation. 

^We apply either the economic size or full SVD depending on the matrix 
shape and report the faster one, but we denote either as econSVD. 


We test using Gaussian random matrix drawn from ^"(0,1) 
for the operation [•] with r = 1/>/||A|| 2 . For a rank- 
k approximation, we compute rank-(/c -f p) approximations 
by setting the over-sampling rate p to 2 for Lanczos, k for 
LTSVD, 5 for FRSVT.^ We apply the power iteration twice, i.e., 
Tj = 2. While LTSVD in Fig. 4-(g), (i), (1) is faster than ours, 
the approximation error is significantly higher than any other 
truncated SVDs, and it results in slower convergence in RPCA 
as we will see in the following test. 

Robust PCA test To observe the convergence behavior of 
SVT methods in RPCA [4], we compare our method with 
various SVT methods, such as SVD, LTSVD, BLWS [20], 
FSVT, RSVD^, using an inexact augmented Lagrange multiplier 
method [18] (iALM, or called alternating directional multi¬ 
plier method), and other variants, such as Mu et al. [29]^, 
Active Subspace [22], Li filtering [23] in Table 5 for resultant 
performance. Also, the continuous convergence behaviors are 
shown in Figs. 5 (over the number of iteration) and 6 (over 
computation times).^ We generate data by following [35] with 
10% rank ratio and 5% corruption ratio. We set the trade¬ 
off parameter A = ^ ^ as suggested by Candes et 

al. [4] for all the methods. All other parameters for FRSVT 
are same as mentioned in Sec. 2 and with Table 4. We apply 
the proposed adaptive rank prediction described in Sec. 2.3 
to LTSVD, RSVD and our FRSVT. LTSVD shows convergence 

^The rationals behind the used parameters are that Lanczos is an almost 
exact method, LTSVD produces quite low accuracy despite high over- 
sampling rate, and FRSVT is an approximate method fairly more accurate 
than LTSVD even with low over-sampling rate. 

^As noted in Table 3, this implementation involves 2-side random projec¬ 
tion and respective power iterations. By this, we can see the effects of the 
different implementation of [12]. 

®For Mu et al. [29], we could not reproduce their result. However, we 
include the results for thoroughness. The results are consistent with the 
early reported results in Liu et al. [23]. 

^The convergence of Li filtering is not compared, because the algorithm 
solves RPCA by dividing several block matrices of which convergence is 
not comparable to other methods. 
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(a) 2000 X 2000 (b) 4000 x 4000 (c) 6000 x 6000 (d) 10000 x 100 

Fig. 5: RPCA comparisons. X-axis: Number of iterations, Y-axis: Stopping criterion. Except Mu et al. is based on the exact ALM method, all the other 
methods are based on the inexact ALM. 


SVDjviatlab 2010a 

Built-in SVD(-, ' econ') is based on Intel MKF 10.2.2 and 
FAPACK 3.2.1. 

SVDMatlab 2014a 

Built-in SVD(-, ' econ' ) is based on FAPACK 3.4.1 in Intel 
MKF 11.0.5. 

LTSVD’^ 

The Matlab implementation by Ma et al. [26] is used. 

Fanezos 

The implementation optimized by Lin et al. [17] is used. It 
is implemented by Matlab and Mex to use FAPACK (ver. 
3.4.1 in Intel MKF 11.0.5 is used). 

BFWS 

The Matlab implementation by Lin et al. [20] is used. 

Mu et al. 

The Matlab implementation provided by Mu et al. [29] is 
used. All the parameters are directly used as suggested. 

Active Subspace 

The Matlab implementation provided by Liu et al. [22] is 
used. All the parameters are directly used as suggested. 

Li filtering 

The Matlab implementation provided by Liu et al. [23] is 
used. All the parameters are directly used as suggested. 

FSVT 

We implement FSVT by Matlab and Mex based on Intel 
MKF 11.0.5, and the speed-up gain is checked and con¬ 
sistent with the reported results in Cai et al. [3]. All the 
parameters are directly used as suggested. 

RSVD 

We implement RSVD [12] based on FAPACK 3.4.1 in Intel 
MKF 11.0.5. by referring to RedSVD=^^ 

FRSVT 

Our Matlab implementation based on Intel MKF 11.0.5. Par¬ 
tially, we implement Mex functions for QR-CP (dgeqp3) 
and Polar Decomposition with BFAS and FAPACK 3.4.1 
routines. 


* LTSVD: In the implementation of [26], the leverage score estimation suggested by 
the original paper [6] is omitted, which is required for the importance sampling. So 
we implement it with 0(mn) complexity. 

** RedSVD (http://code.google.eom/p/redsvd/): Since RedSVD is implemented 
based on Eigen 3.0-bl, without the power iteration but with 2-side random projection, 
so the original implementation is not fast and inaccurate. Thus, we re-implement 
RSVD [12] by referring to RedSVD with 2-side power iteration. 

Pqj. FSVT, RSVD and FRSVT implementation, we utilize Armadillo library [38] as a 
high level wrapper for FAPACK and BFAS of Intel MKF in our Mex implementation 
part. In the Mex functions of Fanezos, Fin et al. [17] directly call FAPACK routines 
without any external library. 

TABLE 3: Lists of comparison methods and details for SVT tests. All the 
parallelization techniques are turned off for fair comparison. 


iALM]\/[ethods 

econSVD 

LTSVD 

BFWS 

FSVT 

RSVD 

FRSVT 

FRSVT-RP 

P 

- 

+5 

- 

- 

+2 

+2 

+2 

AP 

X 

O 

o 

X 

O 

O 

O 

PI 

- 

- 

- 

- 

2 

2 

2 


TABLE 4: Parameter settings for convergence evaluation of RPCA. (p: Over- 
sampling rate, AP: Adaptive rank prediction, PI: The number of power 
iterations rj.) Mu et al. [29], Active Subspace [22] and Li-filtering [23] are 
not compatible with the parameters. 

degradation when achieving comparable accuracy with others 
due to rough approximation on both bases and singular values. 
Other methods including our FRSVT show similar numbers 
of iterations and accuracy, but our method has a considerably 
lower computation time for a single iteration. 

Fig. 7 shows iALM behavior over varying over-sampling or 
power iteration settings. In Fig. 7-(a), varying over-sampling 
rate p is tested, where we fix the number of power iterations 




Fig. 6: Convergence time plots of RPCA. X-axis: Elapsed time (sec.), Y-axis: 
Stopping criterion. Except Mu et al. is based on the exact ALM method, 
all the other methods are based on the inexact ALM. As a representative 
example, matrices of size 6000 x 6000 are used, but the results can be 
generalized to other size matrices. 

T] = 2 without adopting AP. Without AP, even in the case of 
p = 2, which has been used as our standard experiment setting, 
iALMs with both FRSVT and RSVD do not converge to the 
ground truth. This indicates that AP is necessary for a fixed 
over-sampling rate scheme. Fig. 7-(b) shows the results with 
the varying number of power iterations 77 , where we fix p = 2 
with AP. The result implies that a single power iteration is 
sufficient in practice. 

4.2 Applications 

This section shows applications of our FRSVT method to 
various types of low-rank optimization problems summarized 
in Table 6 , such as affine constrained NNM (Type A), non- 
convex truncated NNM (Type B) and NNM on tensor structure 
(Type C). We show them with typical computer vision applica¬ 
tions in what follows. 

Type A - Robust Subspace Clustering by Low-Rank Repre¬ 
sentation (LRR) Many visual data are often characterized by 
a mixture of multiple subspaces. One of the recent promising 
methods is LRR, which effectively performs subspace clus¬ 
tering and noise correction simultaneously. While both noise 
correction and subspace clustering are known to be challeng¬ 
ing, robust LRR has been shown effective even with large 
corruptions. The robust LRR can be formulated by Type A, 
which can be efficiently solved by iALM. In this experiment. 
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iALMgconSVD 

Matlab 2010a Matlab 2014a 

iALMLTSVD 

iALMBLWS 

Mu et al. 

iALMpsvr 

iALMRsvD 

iALMpRsvT-RP 

(Proposed) 

Active Sub. 

Li filter 


IISIIo 

10000 X 100 

99,999 


879,415 

99,999 

999,999 

99,999 

99,999 

99,999 

132,666 

231,604 

2000 X 2000 

399,997 


994,479 

399,998 

3,998,401 

399,997 

399,993 

399,997 

399,991 

402,932 

4000 X 4000 

1,599,981 


5,607,211 

1,599,984 

15,995,318 

1,599,981 

1,599,979 

1,599,981 

1,599,968 

1,663,440 

6000 X 6000 

- 

3,599,952 

35,662,548 

3,599,966 

35,990,553 

3,599,952 

3,599,955 

3,599,952 

3,599,938 

3,752,782 


Accuracy 

10000 X 100 

7.34e-05 


3.39e-02 

2.66e-04 

7.65e-01 

7.34e-05 

6.75e-05 

9.33e-05 

3.00e-02 

5.16e-02 

2000 X 2000 

2.11e-07 


7.77e-07 

1.03e-06 

l.lOe+00 

2.11e-07 

3.06e-07 

2.11e-07 

5.70e-07 

3.61e-07 

4000 X 4000 

1.81e-07 


4.59e-07 

4.83e-07 

1.37e+00 

1.81e-07 

2.08e-07 

1.81e-07 

3.56e-07 

2.80e-07 

6000 X 6000 

- 

1.42e-07 

7.76e-05 

3.83e-07 

1.49e+00 

1.42e-07 

1.71e-07 

1.43e-07 

2.89e-07 

2.08e-07 


Total Time 

10000 X 100 

3.217 

1.982 

1.722 

1.317 

125.368 

2.436 

0.863 

0.782 

1.349 

3.179 

2000 X 2000 

2751.910 

147.055 

11.075 

9.808 

118.798 

348.132 

7.928 

5.123 

11.163 

94.300 

4000 X 4000 

21668.280 

1068.863 

69.547 

53.468 

759.794 

2527.141 

49.027 

30.274 

79.471 

683.159 

6000 X 6000 

- 

3360.379 

203.051 

164.752 

2385.272 

8238.109 

147.139 

88.132 

246.738 

1914.431 


Avg. time for a single iteration 

10000 X 100 

0.140 

0.086 

0.039 

0.049 

1.687 

0.104 

0.038 

0.034 

0.047 

- 

2000 X 2000 

119.415 

6.393 

0.274 

0.393 

1.667 

15.189 

0.345 

0.226 

0.465 

- 

4000 X 4000 

947.908 

46.481 

1.726 

2.185 

10.664 

110.006 

2.132 

1.348 

3.311 

- 

6000 X 6000 

- 

146.192 

4.295 

6.848 

33.505 

358.899 

6.397 

3.942 

10.281 

- 


Total no. iteration 

10000 X 100 

23 


43 

24 

73 

23 

23 

23 

29 

- 

2000 X 2000 

23 


41 

24 

70 

23 

23 

23 

24 

- 

4000 X 4000 

23 


41 

24 

70 

23 

23 

23 

24 

- 

6000 X 6000 

- 

23 

48 

24 

70 

23 

23 

23 

24 

- 


Speed-up gain against the baseline 

10000 X 100 

- 

1.62X 

1.87X 

2.44 X 

0.03 X 

1.32 X 

3.73 X 

4.11 X 

2.38 X 

1.01 X 

2000 X 2000 

- 

18.71 X 

248.48 X 

280.58 X 

23.16 X 

7.90 X 

347.11 X 

537.17 X 

246.52 X 

29.18 X 

4000 X 4000 

- 

20.27 X 

311.56 X 

405.26 X 

28.52 X 

8.57 X 

441.97 X 

715.74 X 

272.66 X 

31.72 X 

6000 X 6000 

- 

- 

- 

- 

- 

- 

- 

- 

- 

- 


TABLE 5: Quantitative comparison on RPCA. The accuracy is measured by NRMSE with the ground-truth data, and the speed-up gain is computed against 
the baseline method, iALMeconsvD on Matlab 2010a. For Mu etal. [29], we used the implementation and parameters provided by the authors. Since Mu et 
al. is based on the exact ALM algorithm [18], while other methods are based on the inexact ALM [18], we report the times for a single outer iteration of 
Mu et al. as ‘Avg. time for a single iteration. In addition, the highly stochastic property of the randomly projected nuclear norm makes the algorithm [29] 
sensitive to data and does not preserve the singular values and sparsity structure of the matrix. This result is consistent with another stochastic method, 
LTSVD (see and compare ||S||o of LTSVD and Mu et al.). Also, since Li filtering is incompatible to compare with other methods, the time for a single 
iteration and the total number of iteration are not reported. 



■ALM-econSVD 
iALM-FRSVT-WS [p=0] 
iALM-FRSVT-WS [p=2] 
iALM-FRSVT-WS [p=6] 
iALM-FRSVT-WS [p=10] 
iALM-RSVD [p=0] 
iALM-RSVD [p=2] 
iALM-RSVD [p=6] 
iALM-RSVD [p=10] 


(a) Over-sampling 



Iterations 

(b) Power iter. 


Fig. 7: Varying parameter tests for (a) over-sampling and (b) power iteration 
in RPCA problem. X-axis: Number of iterations, Y-axis: Normalized root 
mean square error (NRMSE) of low-rank solution L as 


Type 

Objective function 

Constraint 

A 

argmin L L + A S 2,1 

L,S 

0 = ZL + S 

B 

argmin cri(L) + A||S||i 

0 = L + S 

C 

3 

argmin X] +A||£:||i 

jC,£,r i=i 

0 or = jC + s 


TABLE 6: Examples of NNM related objective functions. (A) Low-rank rep¬ 
resentation (LRR). (B) RPCA based on the non-convex truncated nuclear 
norm, where k is the target rank. (C) Low-rank and sparse 3-order tensor 
decomposition with alignment. Here, || • || 2 ,i is h,! norm, {a^} are balancing 
parameters among the unfolding matrices £(i), and = 1 is assumed. 
We refer the basic tensor algebra notations described in [44]. 


Computational Time (s) 



LRR+SVD 

LRR+SVD 

LRR+Ours 


Matlab 2010a 

Matlab 2014a 


Time per Motion 

1.230 

1.016 

0.452 

Time for 640 Faces 

419.440 

75.216 

44.590 


TABLE 7: Comparisons of the subspace segmentation algorithms on Hop- 
kins155 [40] (Motion) and Extended Yale Database B [16] (Face). Motion 
Segmentation Errors on the Hopkins155 of both LRR and LRR-i-Ours are 
1.59%. The segmentation accuracies (%) on the Yale data are 79.06 for 
both LRR and LRR-i-Ours. These results show the improvement of the 
computation time with retaining the same accuracy. 


we use the same parameter settings and evaluation metrics 
suggested in [21]. 

We apply FRSVT to the robust LRR for motion segmen¬ 
tation on Hopkinsl55 [40] and for face recognition on a part 
of Extended Yale Database B [16], respectively. All parameters 
for FRSVT are the same as mentioned in Sec. 2. We use all 
the 156 sequences in Hopkinsl55, and the first 10 classes in 
Extended Yale Database B, in which each class contains 64 
face images captured under various illumination conditions - 
given 42 x 48 resized images, we construct a 2016 x 640 data 
matrix by vectorizing each image. While Hopkins 155 is only 


weakly corrupted, Yale data is heavily corrupted by shadows, 
specularity and noise. As shown in Table 7, our method speeds 
up LRR without degrading of the accuracy. 

Type B - Semi-Online Weather Artifact Removal We con¬ 
sider a weather artifacts {e.g., snow or rain) removal problem 
in a video sequence. Since the weather artifacts have non- 
deterministic appearance and a sparse yet random distribution 
in the spatio-temporal domain, we model the artifacts as sparse 
outlier and the scene as low-rank, while we want to retain 
moving objects. We apply the low-rank and sparsity decompo¬ 
sition to n frames in a sliding window manner to leave moving 
objects in the latent images. 
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■ SVT ■ Update Z ■ Update E ■ Others 



0 20 40 60 80 

Computational Time (Seconds) 

Fig. 8: Computation time comparison on the subspace clustering applica¬ 
tion [21]. Computation times are measure(d for face clustering on Exten(je(d 
Yale Database B [16]. 


Fig. 9: Qualitative results for the weather artifact removal, (a) Sample input 
images O. (b) Low-rank images Z. 

Unfortunately, since finding low-rank solution by the nu¬ 
clear norm relies on the high dimensionality of data [4], it 
could be degraded when only with a small number of frames 
as reported in Oh et al. [30]. With the assumption that the 
object motions are small within a few frames, we can en¬ 
courage the low-rank solution to be rank-1 and decompose 
sparse corruptions by Type B optimization with k = 1, where 
O is constructed by stacking vectorized n images. It can be 
effectively solved by alternating Partial SVT (PSVT) [30] and 
li minimization based on iALM. We replace PSVT by FRSVT 
with the partial thresholding. We transfer the previous basis 
estimation to the next n frame optimization, so it can be 
regarded as a semi-online algorithm. 

We set the trade-off parameter A = , ^ = as suggested 

^ ymax(m,n) 

by Oh et al. [30], and limit the maximum rank-/ for computing 
partial SVD in FRSVT as: / = 2 when n = 3, and / = 3 when 
n = 5 and 10. We also apply the power iteration twice, i.e., 
Tj = 2. Our algorithm produces n results simultaneously for n 
images. With an n = 5 sliding window, the method based on 
SVD takes 158.9ms per a single channel of a 384x288 image 
on Matlab 2014a (318.3ms on Matlab 2010a), while our method 
only takes 65.5ms. The qualitative results of our method can be 
found in Figs. 9 and 10, which show plausible snow removal. 

Type C - Simultaneous Multi-Image Alignment and Rectifi¬ 
cation Simultaneous multi-image alignment and rectification 
problem is Type C optimization suggested by Zhang et al. [44] 
(called SRALT). The method combines the ideas of TILT [45] 
and RASL [44] on the third-order tensor structure, which 
exploits the low-rank texture property and nuclear norm-based 
misalignment error. We show the applicability of FRSVT to 
NNM on the third-order tensor. We first show that SRALT with 
FRSVT produces consistent results with the original SRALT on 


Windows dataset [36] in Fig. 12. Then, we present a new appli¬ 
cation of SRALT using gait data. We use the parameters ol and 
A in Type-C of Table 6 as; A = « = [0.1,0.1,0.8] 

for Windows, and a = [0.15, 0.15, 0.7] for the gait data. 

Since gait is a biometric signal spanned on not only spatial 
domain (2D), but also temporal domain, so it is natural to rep¬ 
resent the data by a third-order tensor [25]. In gait recognition, 
the exact alignment of acquired data is frequently assumed, 
but in real situations, the accuracy of the alignment varies 
depending on the pivot angle of the camera and locations of 
moving people. Therefore, the SRALT framework is useful to 
align gait data making sure that the assumption holds and to 
fmd a vertical angle as a preprocessing step. We observe that 
the human silhouettes have the low-rank texture property, and 
as observed in Lu et al. [25], cyclic gait motions are spanned by 
a few number of bases. 

We conduct the experiment using the Gait Challenge data 
set [39], called USF Human ID version 1.7. There are 731 gait 
samples, each with ten sampled gait cycles, and we use the 
first 300 gait samples. We resize the images into 32 x 22, so the 
size of the input tensor is O G ]r32x22x3000 (3000 unaligned 
images with the size 22 x 32). The synthetic misalignment 
generation parameters are described in the below. Randomly 
drawn Euclidean (3 DoF) geometric transformations are ap¬ 
plied to each gait image. The parameters are set to: 1) Scale 
factor: fixed to 1, 2) Angle noise: (^)^)' Translation 

noise: A^(0, 0.5^) + 2 • (/7(0,1) — 0.5), where cr^) denotes 
Gaussian distribution with the mean /i and variance and 
U{a, b) denotes uniform distribution defined on [a, b]. 

The set of geometric parameters T = {^1, • • • ,^3000} (see 
Type C in Table 6) are defined with a p-group parameterization 
as Qj G where 3-DoF parameterization {i.e., p = 3) for 
Euclidean transformation is used in our experiments. Our 
method is implemented on top of [44] by replacing their SVT 
method with ERSVT. While the method of [44] converges 
at the objective value 206.69 and takes about 6 hours 51 
min. (24652s), our method takes only about 1 hour 45 min. 
(6280s) and achieves an even smaller objective score 206.46. 
The qualitative and detailed computation time comparisons are 
summarized in Eigs. 13 and 14, respectively. 

5 Discussions 

We have presented a fast approximate SVT method that ex¬ 
ploits the property of iterative NNM procedures by range 
propagation and adaptive rank prediction. The approximation 
error bound shows that our method can produce reliable 
approximation. The proposed method has been assessed using 
the problems of affine constrained NNM, non-convex NNM 
and NNM on tensor structure as well as the original NNM. The 
empirical evaluations showed the consistent result with the 
theoretical analysis, and our approach can reduce the computa¬ 
tional time of applications that involve low-rank optimization 
without degrading accuracy and convergence behavior. 

Eor convergence, while a general convergence analysis 
against various host problems and algorithms is very inter¬ 
esting, it is going to be challenging. RPCA and the Type- 
A application are convex programs, thus if the algorithms 
converge with satisfying the respective termination criteria 
(derived from the convergence criteria [18], [19]), then the 
algorithms may produce a solution close to global optimal 
within a small distance {i.e., up to e-optimality). At this point, 
we can only say that a host algorithm integrated with ERSVT 
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(b) (c) (d) (e) 


Fig. 10: Comparisons according to varying sliding window lengths, (a) Sampled input image, (b) The exact PSVT based result with the sliding window 
length 5. (c-e) The PSVT by FRSVT based results with the sliding window length {3,5,10} respectively. [Top] Low-rank images, L. [Bottom] Absolute of 
sparse images, abs(S). 


Computational Time (ms) 


Sliding Window n 

3 

5 7 10 

Elapsed Time 

71.5 

65.5 68.2 66.2 


Fig. 11: Computation times of the FRSVT based semi-online weather arti¬ 
fact removal. The elapsed times are measured for a single color channel. 
Since our algorithm produces n results simultaneously for n images at an 
execution, we report the time for a single image for a single channel. This 
results show that, when the number of inputs is small, our weather artifact 
removal algorithm has linear complexity according to the length of a sliding 
window, thus the elapsed times for a single image are similar. 
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(c) SRALT + FRSVT 


Fig. 12: Qualitative comparisons for the simultaneously alignment and rec¬ 
tification on Windows data [36]. The final solutions produce similar results 
while the computational speed is enhanced by FRSVT. 


converges to e-optimal, if and only if the problem is convex, 
the host algorithm converges and FRSVT yields reasonably 
accurate solutions. 

The current major computational bottleneck of our method 
is in the power iteration scheme. We are interested in further 
reducing the computational complexity by effectively relaxing 
this computation block in the future. 
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Fig. 13: Qualitative comparison of RASL [36] and SRALT [44] + Our FRSVT. [Top] Samples of input images. [Middle] Results obtained by RASL [36]. 
[Bottom] Results obtained by SRALT [44] + Our FRSVT. [Right side] Average images. (Top: for all input images, Middle: for RASL, Bottom: for SRALT+Ours). 
Due to the angular bias shown in (a), RASL aligns samples to be consistent at the biased angle, while the SRALT based method in (c) not only aligns 
images, but also corrects upright poses. 
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Fig. 14: Computation time comparison on registration and rectification 
application [44]. The NNM for tensor consists of 3-way SVTs. Computational 
times are measured for a single outer iteration (almost 100 inner iterations). 
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